State of Data Science and Machine Learning

In this article, we visualize data from three consecutive years of the Kaggle survey (2017, 2018, and 2019). The results include raw numbers about who is working with data, what’s happening with machine learning in different industries, and the best ways for new data scientists to break into the field. Kaggle published the data in as raw a format as possible without compromising anonymization, which makes it an unusual example of a survey dataset.

Loading the data

Preprocessing

Renaming Columns

Dropping Columns

Country

Continent

Age Group

Formal Education

Current Salary

Exploratory Data Analysis

Responses by Years

Comparing the number of responses across the three years shows that 2018 received the most.
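A per-year comparison of this kind can be sketched with pandas as below. This is a minimal illustration, not the original notebook code: the `responses` frame and its `Year` column are assumptions (a column added while loading each year's file), and the rows are toy data rather than real survey counts.

```python
import pandas as pd

# Hypothetical stand-in for the combined survey data: one row per response,
# with an assumed "Year" column recording which survey the row came from.
responses = pd.DataFrame({"Year": [2017, 2017, 2018, 2018, 2018, 2019, 2019]})

# Count responses per survey year, ordered chronologically.
responses_by_year = responses["Year"].value_counts().sort_index()

# Year with the most responses in this toy data.
busiest_year = responses_by_year.idxmax()

print(responses_by_year)
print("Most responses:", busiest_year)
```

The resulting series can be passed straight to a bar chart, for example with `responses_by_year.plot(kind="bar")` when matplotlib is installed.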

Responses by Countries

Each year, the most responses come from India and the United States.
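One way to find the leading countries per year is to count (year, country) pairs and keep the top few rows in each year. The sketch below assumes columns named `Year` and `Country` and uses made-up rows; the helper `top_countries` is hypothetical and only illustrates the grouping.

```python
import pandas as pd

# Toy data with assumed column names; not real survey responses.
responses = pd.DataFrame({
    "Year": [2017, 2017, 2017, 2018, 2018, 2018, 2019, 2019, 2019],
    "Country": [
        "India", "India", "United States",
        "United States", "United States", "India",
        "India", "India", "Brazil",
    ],
})


def top_countries(df, n=2):
    """Return the n countries with the most responses in each year."""
    counts = (
        df.groupby(["Year", "Country"])
        .size()
        .rename("Responses")
        .reset_index()
    )
    # Sort by year, then by descending response count within the year.
    counts = counts.sort_values(["Year", "Responses"], ascending=[True, False])
    # Keep the first n rows of each year after sorting.
    return counts.groupby("Year").head(n).reset_index(drop=True)


top = top_countries(responses, n=1)
print(top)
```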

Responses by Gender

Each year, the majority of the participants are men. The same breakdown can also be viewed for an individual country, as follows.
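The gender breakdown, with an optional country filter, can be sketched with `pd.crosstab`. The column names (`Year`, `Country`, `Gender`), the category labels, and the helper `gender_share` are assumptions for illustration, and the rows are toy data.

```python
import pandas as pd

# Toy data with assumed column names and labels; not real survey responses.
responses = pd.DataFrame({
    "Year": [2018, 2018, 2018, 2018, 2019, 2019, 2019, 2019],
    "Country": ["India", "India", "United States", "United States"] * 2,
    "Gender": ["Male", "Male", "Male", "Female",
               "Male", "Female", "Male", "Male"],
})


def gender_share(df, country=None):
    """Percentage of responses by gender for each year.

    If `country` is given, only responses from that country are counted.
    """
    if country is not None:
        df = df[df["Country"] == country]
    # normalize="index" turns each year's row into fractions that sum to 1.
    return pd.crosstab(df["Year"], df["Gender"], normalize="index") * 100


overall = gender_share(responses)
india = gender_share(responses, country="India")
print(overall)
print(india)
```

Returning a year-by-gender table keeps the overall and per-country views in the same shape, so the same plotting call can be reused for both.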

Responses by Continent

The number and percentage of participants can also be analyzed by continent.
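A continent-level summary needs a country-to-continent lookup followed by counts and percentages. The sketch below uses a deliberately tiny, hypothetical `CONTINENT` dictionary and toy rows; a real analysis would need a complete mapping, and the `Unknown` bucket for unmapped values is an assumption.

```python
import pandas as pd

# Hypothetical, deliberately incomplete country-to-continent lookup.
CONTINENT = {
    "India": "Asia",
    "China": "Asia",
    "United States": "North America",
    "Germany": "Europe",
}

# Toy data with an assumed "Country" column; not real survey responses.
responses = pd.DataFrame({
    "Country": ["India", "China", "United States", "Germany", "India", "Other"],
})

# Map each country to a continent; anything unmapped is labeled "Unknown".
responses["Continent"] = responses["Country"].map(CONTINENT).fillna("Unknown")

# Number of responses per continent, plus each continent's share of the total.
by_continent = responses["Continent"].value_counts().to_frame("Responses")
by_continent["Percent"] = (
    by_continent["Responses"] / len(responses) * 100
).round(1)

print(by_continent)
```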

Responses by Age Groups

Responses by Education

Current Job Title

Current Salary Range

Activities

Media Sources

Data Science Courses

Integrated Development Environments

Notebook Host

Programming Languages

Visualization Libraries

Specialized Hardware

ML Algorithms

ML Tools

Computer Vision Methods

Natural Language Processing (NLP)

Machine Learning Frameworks

Cloud Computing Platforms

Cloud Computing Products

Big Data / Analytics Products

Machine Learning Products

Automated Machine Learning Tools


References

  1. 2017 Kaggle Machine Learning & Data Science Survey
  2. 2018 Kaggle Machine Learning & Data Science Survey
  3. 2019 Kaggle Machine Learning & Data Science Survey
  4. 2020 Kaggle Machine Learning & Data Science Survey